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Abstract 

Model selection is an important activity in modern data analysis and the conven- 
tional Bayesian approach to this problem involves calculation of marginal likelihoods 
for different models, together with diagnostics which examine specific aspects of model 
fit. Calculating the marginal likelihood is a difficult computational problem. Our 
article proposes some extensions of the Laplace approximation for this task that are 
related to copula models and which are easy to apply. Variations which can be used 
both with and without simulation from the posterior distribution are considered, as 
well as use of the approximations with bridge sampling and in random effects models 
with a large number of latent variables. The use of a t-copula to obtain higher accu- 
racy when multivariate dependence is not well captured by a Gaussian copula is also 
discussed. 
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1 Introduction 



In Bayesian inference computation of marginal likelihoods is essential for calculating pos- 
terior model probabilities and Bayes factors, fundamental quantities for model comparison 
in the Bayesian framework. If Mi and M2 are two models to be compared with respective 
parameters 61 and 62, priors p{6i) and ^(^2) and likelihoods p{y\Oi) and p{y\02) where 
y = {yi, ...,yn)^ denotes the data then if p{Mi) and p{M2) denote the prior probabilities for 
Ml and M2 then the ratio of their respective posterior probabilities is 

pjMily) ^ pjMi) ^ p{y\Mi) 
p{M2\y) p{M2) ^ p{y\M2) 

where p{y\Mj) = J p{6j)p{y\0j)d0j is the marginal likelihood for model Mj and the second 
term on the right side above is called the Bayes factor comparing Mi to M2. 

There are many suggestions for how to calculate the marginal likelihood. One of the 
simplest methods is the Laplace approximation. We consider now a single model M with 
parameter 6 of dimension p, prior p{6) and likelihood p{y\0) with marginal likelihood 

p{y) = I p{e)p{y\e)do. (i) 

Suppressing dependence on y, write f{0) = p{0)p{y\0) and g{0) = log f{0). Let 6 be 
the mode of g{6) and H be the negative Hessian at the mode. The Laplace approximation 
approximates g{6) by g{0)~l/2{6 — OY H{6 — 0). Substituting this into ^ and integrating 
gives 

p{y) ^ {2nr/'\H\-'/'f{0) (2) 
where it can be shown that the error is of order 0{n~^). The right side of ([2]) is commonly 
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used for approximating the marginal likelihood in Bayesian inference but there has also been 
much interest in the use of the Laplace approximation for calculation of posterior moments 
in Bayesian applications (Tierney and Kadane, 1986). O'Hagan and Forster (2004), Chapter 
9, is a good summary of these apphcations and associated theory. 

There are many other suggested methods for computing the marginal likelihood which are 
simulation based. Raftery et al. (2007) and Newton and Raftery (1994) consider approaches 
based on using the so-called harmonic mean identity, an extension of which was discussed by 
Gelfand and Dey (1994). Such an approach can be unstable, although Raftery et al. (2007) 
suggest some possible solutions. Averaging the likelihood over parameters simulated from 
the prior is another possibility that directly uses the definition ([1]), but since the prior is 
overdispersed with respect to the likelihood this can be very inefficient requiring very large 
sample sizes. Several Markov chain Monte Carlo (MCMC) methods attempt to sample on 
the model and parameter space jointly. Carhn and Chib (1995) suggest an approach based on 
simulating on a product space. However this approach is hard to apply with a large number 
of models and requires choice of some tuning parameters that make the method unsuited to 
routine use. Green (1995) extends the Metropolis-Hastings algorithm to situations involving 
model uncertainty and his trans-dimensional MCMC method is the method of choice when 
there is a large number of models to be compared. However, devising MCMC moves to jump 
between different models in this framework is an art that requires problem specific insight. 
Several methods for calculating the marginal likelihood make use of the identity 

_ p{e)piy\e) 
''^''^ ~ P{e\y) 

which holds for any value of by a rearrangement of Bayes' rule. To use this identity, 
simply observe that the numerator is easy to calculate at any so that if we are able 
to estimate the posterior distribution p{0\y) at some point (usually some estimate of 
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the mode) then we immediately have an estimate of the marginal likelihood. The idea of 
estimating the marginal likelihood in this way is attributed to Julian Besag by Raftery 
(1996). Chib (1995) considered use of this identity in the case where a Gibbs' sampling 
MCMC algorithm is employed, although when the Gibbs updates consist of many blocks 
several different runs are needed to get the required density estimate. Although there is a 
way of avoiding multiple runs, this may not work well in high dimensions (see Chib and 
Jehazkov, 2001, for further discussion). Chib and Jeliazkov (2001) extended the method of 
Chib (1995) to the case of a general Metropolis-Hastings algorithm, although again if the 
Metropolis-Hastings scheme updates the parameters in many different blocks the method 
can be tedious to apply. Mir a and NichoUs (2004) show that the method of Chib and 
Jehazkov (2001) is a special case of the bridge estimator of Meng and Wong (1996) in the 
way that it estimates certain conditional densities and they also discuss optimality of the 
bridge sampling implementation, de Valpine (2008) also considers several further refinements 
of the approach. Gelman and Meng (1998) extend the bridge estimator of Meng and Wong 
(1986) to an approach they call path sampling ~ it is related in statistics to a method for high- 
dimensional integration discussed by Ogata (1989) and to previous work in statistical physics. 
Implementing path sampling can be quite computationally intensive, involving either several 
MCMC runs for different target distributions or an MCMC run over a joint distribution 
including an auxihary variable. Priel and Pettitt (2008) provide one recent approach to the 
implementation of path sampling. Another recent novel approach to marginal likelihood 
calculation is given by Skilling (2006) although implementation of this approach involves 
possibly difficult simulations from constrained distributions. Han and Carlin (2000) give a 
survey of MCMC based methods for computing the marginal likelihood and suggest methods 
based on separate MCMC runs for different models such as those of Chib (1995) and Chib 
and Jehazkov (2001) as being easiest to use when the number of models to be compared is 
small. 
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We restrict attention in what follows to methods which are easily employed given a 
simulated sample from the posterior distribution. However, we also consider an extension of 
the Laplace approximation which does not require simulation. Ways of combining simulation 
and the Laplace approximation for computing Bayes factors were considered in DiCiccio et 
al. (1997). They recommended for routine use a volume corrected version of the Laplace 
approximation, or for higher accuracy and where evaluations of the likelihood are inexpensive 
a Laplace approximation approach to attaining a near optimal implementation of bridge 
sampling. We discuss this last method later as the copula approximations we introduce here 
can be improved in much the same way. 

In Section 2 we discuss generalizing the Laplace approximation by approximating the 
posterior distribution with a Gaussian copula. Density estimation using the Gaussian copula 
is sensible since in many cases we expect the posterior distribution to be close to normal, so 
that approximating the posterior with a flexible class of densities which contains the Gaussian 
as a special case is attractive. We also discuss generalizations where the Gaussian copula is 
replaced by a t-copula. Copula approximations to posterior distributions have not been used 
very much for Bayesian computation - a notable exception is Reichert et al. (2002) who 
considered their use in importance sampling schemes. In Section 3 we consider different ways 
to estimate the marginal distributions in the copula approximation, both with and without 
simulation output. Section 4 discusses the Laplace bridge estimator which uses an initial 
estimate of the marginal likelihood obtained by Laplace approximation in implementation of 
bridge sampling. A similar estimator using our copula framework is then considered. Section 
5 considers performance of our methods in some simulated examples. Section 6 considers an 
example involving logistic regression. Section 7 considers a random effects heteroscedastic 
probit model for clustered binary data with a large number of latent variables and Section 
8 concludes. The copula approximations we describe work well across the whole range of 
examples we consider, both real and simulated. 
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2 A copula Laplace approximation 

A Gaussian copula distribution for a continuous random vector 6 = {9i, 9p)'^ is constructed 
from given marginal distributions Fj{6j) for 6j, j = l,...,p and a correlation matrix for a 
latent Gaussian random vector. In particular, suppose Z ~ A^(0, A) where A is a correlation 
matrix. Then if 9j = F~^{^{Zj)) then 9j has distribution Fj since is uniform and 

transforming a uniform random variable by the inverse of Fj gives a random variable with 
distribution function Fj. Note that while the 9j have given marginal distributions Fj, they 
are are also correlated due to the correlation between the components of Z. For background 
on Gaussian copula and copula models more generally see Joe (1997). The density function 
of is (see, for example. Song, 2000) 




where r] = rj{6) = (rji, ...,rip)^ with rjj = ^~^{Fj{9j)) and fj is the density function corre- 
sponding to Fj. 

Now suppose we are able to obtain a copula approximation to a posterior distribution 
p{d\y) for a parameter 6, of the form (jl]). We discuss how to obtain such an approximation 
both with and without simulation later. Then given an estimate of the mode of p{6\y) we 
can employ the identity ([3]) and our copula density estimate at 6 to obtain the estimate 

I A|i/2 exp (-lr]{9f{I - \-^)r){e)) 

p{y) ^ p{O)p{y\0) ^ , , . , -■ (5) 

where A is the correlation matrix and fj{9j), j = 1, ...,p are the marginal densities in our 
copula approximation. If 6 is the componentwise posterior median then r]{6) = and we 
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obtain 

This estimate reduces to the ordinary Laplace approximation if we consider the special case 
where our Gaussian copula is a multivariate normal density estimate with mean the posterior 
mode and covariance matrix given by the inverse of the negative Hessian of the log posterior 
at the mode. 



3 Estimating the marginals 

To apply the approximation (jSj) we need a Gaussian copula approximation to the posterior 
distribution. We consider both analytic and simulation based methods for obtaining this, as 
well as an extension of the Gaussian copula approach which uses t-copula. 

3.1 Analytic approach 

Write, as in Section 1, f{6) = p{6)p{y\0), g{0) = \ogf{6) and H = —g"{0) for the matrix 
of negative second order partial derivatives of g{6) evaluated at the mode 6. Decompose 
H as DCD^ where C is a correlation matrix and D = diaug(dj) and H ^ = SAS^ where 
A is a correlation matrix and S = diag(sj). Now consider a Gaussian copula density as 
an approximation to the posterior distribution, where the approximation to the marginal 
posterior distribution for 6j is 

and is a p-vector of zeros but with a one in the jth position and the copula correlation 
matrix is A. 
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The intuition behind this density estimate is as follows. We estimate the marginal for 
9j by considering a slice through the function f[0) with values of ^j, i ^ j, fixed at their 
modal values (this is the function f{6 + {9j — Oj)ej) and then overdisperse by raising this 
function to a power and normalizing. Note that if f{0) is proportional to a multivariate 
Gaussian, then f{6 + {9j — 9j)ej) is proportional to the conditional density of dj given that 
the other components are fixed at their modal values. This conditional distribution has as 
its mean the unconditional mean of 9j, and the variance Then raising this function 

to the power of and normalizing maintains the mean while changing the variance 

from to which is the unconditional variance for 9j (in the multivariate Gaussian 
case). So this operation gives the correct marginal distribution when f{d) is proportional 
to a multivariate Gaussian. The approximation to the marginal is also exact in the case of 
independence (where d'js'j = 1 and A = I). Choice of the copula correlation matrix as A 
is also made to ensure that the approximation to the joint posterior is exact in the case of 
p{6\y) being multivariate normal. The copula approximation is of interest in itself apart 
from the application to computing marginal likelihoods. In particular, for a Gaussian copula 
expectations for low-dimensional marginal distributions are easily calculated. For instance, 
suppose that we want to approximate for the jth component 9j of 6 the posterior expectation 
E{h{9j)\y). Then this is easily obtained from our copula approximation as 

j h{9j)gj{9j)d9j 

where gj{9j) is the marginal for 9j. This expression is easily evaluated with one- dimensional 
numerical integration. The approximation is exact both in the Gaussian case and in the case 
where components of the posterior are independent and it seems preferable to the simple 
normal approximation. 
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3.2 Simulation based approach 

An alternative and more accurate approach to approximating the posterior distribution by 
a Gaussian copula involves using a simulation based method. Suppose that we have a 
sample 0^^\...,6^^^ from the posterior distribution p{6\y) obtained by some method such 
as MCMC Consider the estimate where consists of the componentwise median. We 
estimate the quantities fj{Oj) using kernel density estimates based on the simulation output. 
We note that Hsiao et al. (2004) have considered multivariate kernel density estimation 
in conjunction with the formula ([3]) for estimating the marginal likelihood but clearly this 
approach is limited to fairly low dimensional situations. It only remains to specify how we 
obtain the correlation matrix A in our copula approximation to the posterior. 

Let Tjk be the rank of among the values 6^^\ i = l,...,s. Define the p- vector Z''--'^ 
to have kth component Z^^^ = ^^^{{rjk — 0.5) /s) where $ denotes the standard normal 
distribution function. We obtain A as the estimated correlation matrix of Z^^\ Z^'^\ 
which we obtain by the robust method of Rosseuw and Van Zomeren (1990) rather than using 
the sample correlation matrix, similar to Di Ciccio et al. (1997) in their implementation 
of a simulation based Laplace approximation. Roughly speaking, the above construction 
estimates the marginal distribution for a copula with the empirical distribution function 
and then transforms to the latent Gaussian variables assumed in the copula construction to 
obtain an estimate of the copula correlation. 

We can extend our Gaussian copula approximation to a t-copula. Laplace-type approxi- 
mations using the multivariate t- distribution have been considered previously (Leonard, Hsu 
and Ritter, 1994). A t-copula distribution with u degrees of freedom for a continuous random 
vector = {9i, ...,9p)^ with marginals Fi(6'i), Fp{6p) has density 
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where fkiT, 0, A, u) is the /c-dimensional multivariate ^-density with mean 0, scale A and 
degrees of freedom u, ri{6) = {rji, ...,rip)'^ with rjj = i]j{Oj) = F^^{Fj{9j);0,l,h')) where 
Fk{ri;0, A,i/) is the distribution function for fk{r];0, A,!/), and fj{Oj) is the density for 
Fj{9j). If we have an approximation to the posterior distribution of this form we can again 
obtain an estimate of the marginal likelihood based on the estimated posterior density at 
some value 0. Taking again d as the componentwise median, we obtain 



IlUfM) r(^)r(|) 



p-i- 



Of course, as z/ — > oo this reduces to our former approximation based on the Gaussian copula. 
Once more we need some simple way to estimate A and in this case u from simulation output 
to apply this formula. Let rjk be the rank of 6^^^ among the values i — l,...,s. For 
fixed degrees of freedom u let Tjf^ = Ff ^((r^-fe -0.5)/s; 0, 1, u), T^^^ = {Ti^\ 7^^^ and let 
A(i/) be obtained as the maximum likelihood estimator of the correlation matrix assuming 
the T^-'^ are independent and identically distributed from a multivariate t-distribution with 
mean 0, scale A and degrees of freedom z/. On the copula scale one can consider the data R^-''* 
with R^^^ = {vjk - 0.5) /s and assuming the R^^' are independent and identically distributed 
from the density 

U{F^\n; 0, 1, ly), F-\rp- 0, 1, u)- 0, A{v), v) 

Y[%,h{F-\n-o,i,v);0,i,v) 

obtain a maximum likelihood estimator for v numerically by a grid search. 



4 Laplace bridge estimator 

DiCiccio et al. (1997) find that combining Laplace approximation and the bridge estimator 
of Meng and Wong (1996) is very effective in improving accuracy. In its most general form 
the bridge estimator can estimate a ratio of marginal likelihoods but here we just consider 
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a special case where interest centres on calculation of a single marginal likelihood. We want 
to calculate the normalizing constant (marginal likelihood) p{y) in p{6\y) oc p{0)p{y\6). 
Suppose we have some density r{0) (where in this discussion there is no unknown normalizing 
constant for r) and let t{0) be any function of 6 such that 



< 



t{0)r{e)p{e)p{y\e)do 



< oo. 



Then it is easily shown that 

J p{6)p{y\0)t{6)r{0)de 



p{y) 



jr{e)t{e)p{e\y)de 



If we have a sample 6^^\ ...,0^^'^ from p(0\y) and a sample 6^ \ ...,0^ ^ from r(0), then we 
have 

7E;=i*(e"')'-(e'") ■ 

Meng and Wong (1996) show that the optimal choice of the function t{6) is 



p{y) 

Actually, ([7]) is the optimal choice when the generated samples from both p{0\y) and r{0) 
are independent. For the samples from p{6\y) which are usually generated via MCMC this 
is usually not the case and an adjustment could be made to account for the typically positive 
correlation, although we have not done this here. Note also that ([7]) involves p{y), which 
is unknown. It is possible to implement an iterative version of bridge sampling where p{y) 
is successively refined in ([7]) (Meng and Wong, 1996). The Laplace bridge estimator simply 
uses the Laplace approximation to estimate p{y) in ([7]) for the purpose of determining a t{0) 
for implementation of bridge sampling, and uses the usual normal approximation for r{6). 
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We can similarly suggest estimating p{y) with our Gaussian copula approach, and using our 
Gaussian copula approximation to the posterior for r{6) with kernel density estimates based 
on the simulation output for the marginals. Note that simulating directly from a Gaussian 
copula is straightforward. 

5 Simulation studies 

To evaluate the accuracy of our methods we consider their use for calculating the normalizing 
constant for some known density functions. Of course, the normalizing constant is known to 
be one here so accuracy of the approximations is easily assessed. We consider the multivariate 
skew t distribution of Branco and Dey (2001). This is a convenient distribution to use since it 
can accommodate both skewness and heavy tails, it is easy to simulate from non-iteratively 
and its density function is easy to calculate. Branco and Dey (2001), p. 105, consider a 
generalized multivariate skew t distribution but here we just consider the special case of 
their multivariate skew t. For Y = {Yi, Yk)^ the density is 

V Vl - 5^K-^5 \ v+{y- m)^A \y-fi) J 

where as before fk{y'-, fJ-, A, u) is the fc-dimensional multivariate t distribution with mean fj,, 
scale matrix A and u degrees of freedom, Fk{y; /i. A, u) is the corresponding density function, 
and d = {6i, 6k)'^ is a vector of skewness parameters. Obviously with (5 = we obtain the 
ordinary multivariate t distribution. For our simulations we choose /x = 0, A = / and S of 
the form {6i,0, 0)^. Choosing A and 6 in this way means that only the first component 
of Y is skewed, with Si controlling skewness, 6i > giving positive skewness and 6i < 
giving negative skewness. The random vector Y with density ([8]) may be constructed in 
the following way. Let Z = [Xq X'^]'^ be a (A; + l)-dimensional multivariate t distributed 



12 



random vector with Xq a scalar and X a A;- vector, 

where /i* and A* are partitioned in the same way as Z. Then Y has the distribution of 
X\Xq > 0. Note that this construction gives a simple non-iterative way of simulating from 
this distribution. 

Our simulations investigated the effects of heavy tails (through u), skewness (through 
Si) and dimensionality on the accuracy of the method we have discussed. In particular, 
for each of our methods we considered every combination of z/ = 3, 10, 6i = 0, 0.5, 0.99 
and k = 2,5, 10 dimensions. The methods we compare are the ordinary Laplace approxi- 
mation (LI), a Laplace approximation where we use the componentwise posterior median 
and minimum volume ellipsoid covariance estimation method of Rosseuw and Van Zomeren 
(1990) from simulation output rather than the mode and negative inverse Hessian (L2), our 
copula Laplace approximation without simulation (CLl), our copula approximation with 
simulation (CL2), the t-copula approximation (TC), the Laplace bridge estimator (LB) and 
the copula bridge estimator (CLB). For the methods based on simulation, we used 10,000 
replications. For method L2, the use of the componentwise median and minimum volume 
ellipsoid methods for estimating the mean and covariance were discussed in DiCiccio et al. 
(1997) and found to work well in high dimensions. For the two variants of bridge estimation 
we also used 10,000 simulations from r{d), and for the copula bridge estimator we estimated 
the marginals using a kernel density estimator (we used the default implementation of the 
density function in the stats package of R, R core development team, 2005). For the 
simulation based methods we report average values obtained over 50 simulation replicates, 
with the standard deviation over replicates in brackets. Of course there are ways to approx- 
imate standard errors based on a single replicate (de Valpine, 2008, for example) but we 
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have chosen not to do this here for the purposes of our simulation studies. In practice such 
methods are of course very important. In our tables we have reported the estimated value 
of the log of the normalizing constant (true value 0) rather than the normalizing constant 
itself. The results are shown in Tables 1-3. The main conclusions which emerge are that the 
copula approximations improve over the respective Laplace approximations for the variants 
both with and without simulation. Generally performance of all methods deteriorates with 
higher dimension and heavier tails, as might be expected. Perhaps not intuitively for some 
of the methods performance improves with increasing skewness. Both variants of bridge es- 
timation work well, but the copula bridge method seems to improve over the Laplace bridge 
method in the 10- dimensional case with skewness, with a smaller standard deviation over 
replicates. The t-copula works extremely well, but perhaps this is not surprising given that 
the test function is constructed from a generalization of the multivariate t-distribution. In 
the t-copula, the estimate of the degrees of freedom was chosen from a grid including integer 
values 1 to 10, 15, 20 and 50. In our later real examples the t-copula approximation fares 
less well than in these simulations. 



6 Low birthweight example 

For a real example we consider calculation of marginal likelihoods for model comparison 
in a regression with binary response. In particular, we consider the low birth weight data 
reported by Hosmer and Lemeshow (1989) which are concerned with 189 births at a US 
hospital in a study where it was desired to found out which predictors of low birthweight 
were important in the hospital where the study was carried out. The binary response is 
an indicator for birthweight being less than 2.5 kg. After transforming the predictors as 
described in Venables and Ripley (2002) there are ten covariates, two continuous and eight 
binary predictors. These covariates are shown in Table HI For our analysis of the data, we 
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consider generalized linear models with logit and robit links (with three degrees of freedom 
for the robit link), using the default prior for logistic regression given by Gelman et al. (2008) 
on coefficients for both the choices of link function. See Gelman and Hill (2007) pp. 124- 
125 for a brief introduction to robit regression. Venables and Ripley (2002) consider model 
selection for this example and the logit link using a stepwise approach. They also consider 
the inclusion of second order interaction terms. The final model they choose includes all the 
predictors in Table H] as main effects except for the indicators for well as interaction 

terms age*ftvl, age*ftv2+ and smoke*ui. Here we consider a direct comparison of the 
model including all the original predictors as main effects with the final model of Venables 
and Ripley (2002) for both the logit and robit links. Note that the comparisons between 
links here are non-nested and not easily done via traditional hypothesis testing approaches. 
Venables and Ripley (2002) also consider examining the adequacy of their linear model with 
second order interactions by expanding to a generalized additive model with smooth terms 
for the covariates age, age*ftvl, age*ftv2+ and Iwt. We consider a similar model here, but 
we simply use second order polynomials for representing the additive smooth terms which 
should be adequate for the purposes of model checking. For these three different models and 
the two different choices of link function we calculated the log marginal likelihood using the 
same methods considered in our simulation study. We also considered the same comparisons 
for a random sample from the original data set of size n = 50 to show how the accuracy 
of the approximations is affected by sample size. For generating MCMC iterates we used a 
Metropolis-Hastings scheme with normal random walk proposal with the covariance based 
on the Hessian of the log posterior at the mode. The results of these comparisons are shown 
in Tables [5] and [61 Also shown in the table is a "gold standard" value (GS) for each case 
obtained by Laplace bridge with s = S = 100, 000. Using this value for comparison, we 
obtain a similar picture of the performance of the respective methods to that obtained from 
the simulation study. All methods are remarkably accurate for the full data set. In the small 
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sample setting of the randomly chosen subset, the copula approximations improve over their 
simpler Laplace approximation variants, and bridge sampling works well with the copula 
approach showing less variability than the Laplace bridge for the highest-dimensional case. 
For the i-copula, we estimated the degrees of freedom choosing from a grid of integer values 
where the maximum value is 50 - generally the largest value of 50 was chosen in nearly every 
case, so that performance is generally similar but slightly inferior to the Gaussian copula 
approximation. 

7 A high-dimensional example 

Our last example concerns a complex random effects model for a dataset concerned with 
stated preferences of Australian women on whether or not to have a papsmear test (Ficbig 
and Hall, 2005). There are 79 women in the study and each is presented with 32 different 
scenarios. The response is an indicator for whether the women would undertake a papsmear 
test so there are 32 repeated binary observations on each of the 79 women. We consider 
the following random effects heteroscedastic probit model which was considered in Gu et 
al. (2008) and analyzed using a Bayesian approach. Following the notation of Gu et al. 
(2008) and letting i = 1,...,79 index the different women or clusters, and j = 1,...,32 
index observations within clusters, the binary observation is considered to arise from a 
continuous latent variable y* by 




otherwise. 
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Similar latent variable formulations are often used in Bayesian analyses of simple probit 
models (Albert and Chib, 1993). The y*^ follow the model 

y*j = Xij(3 + + Vij 

where [li is a subject or cluster specific random effect, is a vector of covariates, /3 is an 
unknown vector of regression coefficients and vij ~ iV(0, with al^ = exp{wijd) where 
Wij is a vector of covariates (often Xij = Wij) and 5 is a vector of unknown coefficients. For 
identifiability an intercept should not be included in Wij. The covariates used to define 
and Wij in this example are shown in Table [71 Gu et al. (2008) use the following priors for 
/3, 5 and a^. First, 

(3\8^N{Q,cp{X^X)-^) 

where Cf^ is set to the total number of observations (32 x 79 here) and X = D{S)^^X where 
X = (asfi, 33^32, a^TQ 32)"'" and 

D{d) = diag I exp I — ^ I ,.-,exp I — - — 1 ,...,exp I — - — I 1 . 

Then d ~ A^(0, csl) and a^, cs ~ IG(a, h) independently where a = 1 + 10"^°, 6=1 + 10"^ 
and IG denotes the inverse gamma distribution. An efficient MCMC sampling scheme can 
be developed with /3 and = (/ii, /lyg)^ updated as a single block with a Gibbs sampling 
step, (5 udpated using a Metropolis-Hastings step and cr| and cs updated with Gibbs sampling 
steps. See Gu et al. (2008) for details. If we set (5 = in this model, this results in a 
homoscedastic random effects probit model and it is of some interest to compare this model 
with the full model. See Gu et al. (2008) for references and discussion. 

This is a challenging example because of the presence of the latent variables y*j and 
/X. Our approach effectively integrates out the latent variables which is important since 
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otherwise we obtain a very high-dimensional problem. We will apply the formula ([3]) for 
estimating the marginal likelihood with 6 = {S^ , f3^ , a'^, cs)^ . Note that it is difficult to 
apply bridge sampling with fj, integrated out as this requires evaluating p{y\6) for a large 
number of different values of 6. To apply our approach we need to be able to estimate p{y\6) 
at a single value 0* . As before, assume that we have an MCMC sample from p{0, y*, 
We can use for 0* = {6*\f3*' , crfi*, c|)^ the componentwise posterior median, say. Then to 
estimate p{y\B*) we can simulate values fi^^^ from p(/x|(T^*) and compute 

s ^-^ 

i=l 

To use (13]) to estimate p{y), it only remains to estimate p{6*\y). With our copula approach, 
we can do this directly by fitting a Gaussian copula model to the simulation output. We 
also consider a simple normal density estimate, which is similar to the Laplace-Metropolis 
estimator of Lewis and Raftery (1997). For comparison, we implement the computationally 
intensive but also more accurate method of Chib and Jeliazkov (2001) which can be applied 
with latent variable models such as the one considered here. Write 

p(r|2/) = p{d*\y)p{f3*\d*,y)p{al*\f3*,d*,y)p{c}\P\d*,al*,y). (9) 

In the present context, we use the approach of Chib and Jeliazkov (2001) to estimate each of 
the terms on the right hand side of (111). This requires separate runs for the different blocks 
of parameters in the decomposition (see Chib and Jeliakov, 2001, for further discussion). 
The terms for /3, o"^ and cs are relatively easily handled as full conditionals are available 
for these parameters, but d is updated by a Metropolis-Hastings step. It is of interest to 
see whether the rather tedious but accurate multiple runs approach of Chib and Jeliazkov 
(2001) results in very similar results to an approximation which requires less coding effort 
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and computation time. Table [8] shows the results for the approach of Chib and Jeliazkov 
(CJ), copula approximation (CL) and normal approximation (L). The log marginal likelihood 
is estimated for both heteroscedastic and homoscedastic models. 

The copula approximations work well for much less computational effort. The additional 
computational effort for the coupula approximation is essentially negligible once the MCMC 
run for the full model is obtained whereas CJ requires 2 additional reduced MCMC runs 
where first 6* and then f3* and S* are held fixed. In the table as before we report a mean 
and standard deviation (bracketed) across 50 simulation replicates for each of the methods. 
We also tried the t-copula approximation but this was similar but slightly inferior to the 
Gaussian copula approximation. 

We can envisage a role for our copula approximations in conjunction with the CJ approach 
and similar approaches in high- dimensional situations. One could use a copula approximation 
for some blocks of parameters in estimating the conditional distribution in (Q. When it is 
natural to use a large number of small blocks in the MCMC scheme the method of CJ may 
be very tedious to apply so grouping some small blocks together and applying a copula 
approximation while dealing with the remaining blocks using the CJ approach (for instance 
for blocks where the full conditional is available) is potentially attractive. The greater 
accuracy of the copula approximation compared to the normal approximation would allow 
the consideration of a larger number of smaller blocks. 

8 Discussion and Conclusions 

With large datasets becoming increasingly common in statistical applications there has been 
recent renewed interest in fast deterministic approximations like the Laplace approxima- 
tion as an alternative to Monte Carlo methods or to improve the implementation of Monte 
Carlo methods in certain problems. Among the methods we have considered, the copula 
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approximations are the ones that work well across the whole range of real and simulated 
examples that we have discussed, and the copula methods usually improve on their simpler 
Laplace type variants both with and without simulation. We believe our methods have great 
potential to be used both by themselves, in combination with other methods, and even in 
conjunction with MCMC algorithms where there is a need for better proposal distributions. 
Investigation of these applications is continuing. 
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Table 1: Methods LI, L2, CLl, CL2, TC, LB and CLB applied to integrate multivariate 
skew t densities in 2 dimensions with 3 and 10 degrees of freedom and zero, moderate and 
extreme skewness. The estimated log of the integral of the density (true value 0) is reported. 



Degrees of 


Method 


Skewness 


freedom 




(5i = 


5i = 0.5 


5i = 0.99 


3 


LI 


-0.51 


-0.51 


-0.60 




L2 


0.09 (0.02) 


0.07 (0.02) 


-0.30 (0.02) 




CLl 


-0.16 


-0.17 


-0.17 




CL2 


0.20 (0.02) 


0.18 (0.03) 


0.09 (0.03) 




TC 


0.04 (0.03) 


0.03 (0.02) 


-0.01 (0.02) 




LB 


0.00 (0.01) 


0.00 (0.01) 


0.00 (0.01) 




CLB 


0.00 (0.00) 


0.00 (0.00) 


0.00 (0.01) 


10 


LI 


-0.18 


-0.18 


-0.34 




L2 


-0.03 (0.01) 


-0.04 (0.02) 


-0.27 (0.02) 




CLl 


-0.05 


-0.05 


-0.05 




CL2 


0.07 (0.03) 


0.07 (0.03) 


0.04 (0.03) 




TC 


0.02 (0.03) 


0.02 (0.03) 


0.01 (0.03) 




LB 


0.00 (0.00) 


0.00 (0.00) 


0.00 (0.01) 




CLB 


0.00 (0.00) 


0.00 (0.00) 


0.00 (0.00) 
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Table 2: Methods LI, L2, CLl, CL2, TC, LB and CLB applied to integrate multivariate 
skew t densities in 5 dimensions with 3 and 10 degrees of freedom and zero, moderate and 
extreme skewness. The estimated log of the integral of the density (true value 0) is reported. 



Degrees of 


Method 


Skewness 


freedom 




(5i = 


5i = 0.5 


5i = 0.99 


3 


LI 


-1.55 


-1.55 


-1.69 




L2 


1.08 (0.03) 


1.02 (0.03) 


0.50 (0.04) 




CLl 


-1.04 


-1.05 


-1.06 




CL2 


0.66 (0.04) 


0.60 (0.05) 


0.32 (0.06) 




TC 


0.08 (0.05) 


0.04 (0.05) 


-0.01 (0.06) 




LB 


0.00 (0.01) 


0.00 (0.01) 


0.00 (0.02) 




CLB 


0.00 (0.01) 


0.00 (0.01) 


0.00 (0.01) 


10 


LI 


-0.68 


-0.68 


-0.85 




L2 


-0.31 (0.03) 


0.30 (0.03) 


0.08 (0.04) 




CLl 


-0.42 


-0.42 


-0.42 




CL2 


0.18 (0.05) 


0.16 (0.04) 


0.08 (0.05) 




TC 


0.06 (0.04) 


0.05 (0.05) 


-0.04 (0.07) 




LB 


0.00 (0.01) 


0.00 (0.01) 


0.00 (0.01) 




CLB 


0.00 (0.00) 


0.00 (0.00) 


0.00 (0.00) 
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Table 3: Methods LI, L2, CLl, CL2, TC, LB and CLB applied to integrate multivariate 
skew t densities in 10 dimensions with 3 and 10 degrees of freedom and zero, moderate and 
extreme skewness. The estimated log of the integral of the density (true value 0) is reported. 



Degrees of 


Method 


Skewness 


freedom 




5i = 


5i = 0.5 


5i = 0.99 


3 


LI 


-3.58 


-3.58 


-3.74 




L2 


3.89 (0.07) 


3.72 (0.08) 


2.89 (0.06) 




LLl 


-2.97 


-2.97 


-2.98 




CL2 


2.91 (0.08) 


2.76 (0.08) 


2.15 (0.08) 




TC 


0.16 (0.07) 


0.03 (0.08) 


-0.48 (0.34) 




LB 


0.00 (0.03) 


0.00 (0.03) 


0.00 (0.04) 




CLB 


0.00 (0.01) 


0.00 (0.02) 


0.01 (0.01) 


10 


LI 


-1.89 


-1.89 


-2.07 




L2 


1.51 (0.04) 


1.48 (0.04) 


1.19 (0.04) 




CLl 


-1.50 


-1.50 


-1.50 




CL2 


1.18 (0.09) 


1.13 (0.08) 


0.97 (0.06) 




TC 


0.10 (0.07) 


0.08 (0.06) 


-0.12 (0.05) 




LB 


0.00 (0.01) 


0.00 (0.01) 


0.00 (0.02) 




CLB 


0.00 (0.00) 


0.00 (0.00) 


0.00 (0.00) 



Table 4: Predictors for low birth weights data set 



Predictor 


Description 


age 


age of mother in years 


Iwt 


weight of mother (lbs) at least menstrual period 


raceblack 


indicator for race=black (0/1) 


raceother 


indicator for race other than white or black (0/1) 


smoke 


smoking status during pregnancy (0/1) 


ptd 


previous premature labors (0/1) 


ht 


history of hypertension (0/1) 


ui 


has uterine irritability (0/1) 


ftvl 


indicator for one physician visit in first trimester (0/1) 


ftv2+ 


indicator for two or more physician visits in first trimester (0/1) 
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Table 5: Approximations to log marginal likelihoods for models Mq (linear model with all 
original predictors and no interactions), Mi (interaction model of Venables and Ripley) and 
M2 (model with additive terms for continuous covariates) for low birthweight example. 



Link function 


Method 


Model 






Mo 


Ml 


M2 


Logistic 


LI 


-124.3 


-120.0 


-122.4 




L2 


-124.4 (0.2) 


-120.1 (0.3) 


-122.5 (0.3) 




CLl 


-124.2 


-119.8 


-122.1 




CL2 


-124.3 (0.3) 


-120.0 (0.5) 


-122.4 (0.5) 




TC 


-124.6 (0.4) 


-120.4 (0.4) 


-123.1 (0.5) 




LB 


-124.1 (0.0) 


-119.7 (0.0) 


-122.0 (0.1) 




CLB 


-124.3 (0.0) 


-120.0 (0.0) 


-122.5 (0.1) 




GS 


-124.1 


-119.7 


-122.0 


Robit 


LI 


-132.9 


-128.1 


-131.4 




L2 


-132.2 (0.2) 


-127.2 (0.2) 


-129.9 (0.3) 




CLl 


-132.7 


-127.8 


-129.6 




CL2 


-132.6 (0.3) 


-127.8 (0.5) 


-130.7 (0.5) 




TC 


-132.9 (0.4) 


-127.9 (0.3) 


-131.1 (0.5) 




LB 


-132.4 (0.0) 


-127.4 (0.1) 


-130.3 (0.1) 




CLB 


-132.7 (0.0) 


-127.7 (0.0) 


-130.8 (0.1) 




GS 


-132.5 


-127.5 


-130.3 
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Table 6: Approximations to log marginal likelihoods for models Mq (linear model with all 
original predictors and no interactions), Mi (interaction model of Venables and Ripley) and 
M2 (model with additive terms for continuous covariates) for randomly chosen subset of size 
50 for low birthweight example. 



Link function 


Method 


Model 






Mo 


Ml 


M2 


Logistic 


LI 


-37.9 


-38.1 


-38.6 




L2 


-35.7 (0.4) 


-34.8 (1.0) 


-33.2 (0.6) 




CLl 


-36.9 


-36.7 


-35.7 




CL2 


-36.8 (0.3) 


-36.5 (0.7) 


-35.9 (0.4) 




TC 


-36.9 (0.3) 


-36.6 (0.3) 


-36.5 (0.9) 




LB 


-36.6 (0.2) 


-36.0 (0.4) 


-35.2 (0.6) 




CLB 


-36.9 (0.2) 


-36.5 (0.1) 


-36.0 (0.3) 




GS 


-36.6 


-36.1 


-35.4 


Robit 


LI 


-43.2 


-43.1 


-43.1 




L2 


-39.3 (0.4) 


-38.3 (0.9) 


-36.8 (0.9) 




CLl 


-39.9 


-38.5 


-37.0 




CL2 


-4L1 (0.3) 


-40.6 (0.6) 


-39.8 (0.6) 




TC 


-41.5 (0.5) 


-40.8 (0.8) 


-40.1 (0.7) 




LB 


-40.8 (0.3) 


-40.1 (0.7) 


-39.0 (0.9) 




CLB 


-41.2 (0.4) 


-40.6 (0.2) 


-39.9 (0.4) 




GS 


-40.7 


-40.3 


-39.5 



Predictor 



Table 7: Predictors for papsmear data set 
Description 



knowgp 1 if the GP is known to the patient; otherwise 

sexgp 1 if the GP is male; otherwise 

testdue 1 if the patient is due or overdue for a paptest; otherwise 

drrec 1 if the GP recommends that the patient has a paptest; otherwise 

papcost cost of test in Australian dollars 
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Table 8: Approximations to log marginal likelihoods for heteroscedastic and homoscedastic 
models for the papsmear data. 



Method 


Model 




Heteroscedastic Homoscedastic 


L 


-1101.8 (0.5) -1119.1 (0.2) 


CL 


-1101.4 (0.5) -1118.9 (0.2) 


CJ 


-1101.5 (0.5) -1118.9 (0.2) 
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